Weighing Weights

Ernesto

February 1, 2018

Have you ever…

Model Selection

The Problem

  • You have:
    • one realization
    • different data sources
    • coded multiple competing hypotheses
  • You want to know
    • Which one is right?

Naive Error Minimisation

  • Compute summary statistics
    • Real data \(\theta\)
    • Model \(\hat \theta\)
  • Pick the model that minimizes the difference in summary statistics: \[ \min \sum_i | \theta_i-\hat \theta_i| \]
  • But a naive, unweighted sum ignores:
    • Correlation between statistics
    • Differences in variance
    • Differences in units
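A minimal sketch of the units problem (all numbers invented): the ranking of two hypothetical models flips when one summary statistic is re-expressed in different units.

```python
# Sketch: the naive unweighted error is sensitive to the units of each
# summary statistic. All numbers here are invented for illustration.

def naive_error(real, simulated):
    """Unweighted L1 distance between summary-statistic vectors."""
    return sum(abs(r - s) for r, s in zip(real, simulated))

# Two statistics: total catch (in tonnes) and average trip length (hours).
real    = [120.0, 8.0]
model_a = [100.0, 8.0]   # misses the catch by 20 tonnes
model_b = [120.0, 2.0]   # misses trip length by 6 hours

# Model B "wins" only because catch happens to be measured in large numbers.
assert naive_error(real, model_b) < naive_error(real, model_a)

# Re-expressing catch in kilotonnes flips the ranking.
real_kt    = [0.120, 8.0]
model_a_kt = [0.100, 8.0]
model_b_kt = [0.120, 2.0]
assert naive_error(real_kt, model_a_kt) < naive_error(real_kt, model_b_kt)
```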

Data Available - Visit Counts

Summary Statistics - 1

Data Available - Logbook

Summary Statistics - 2

  • Fit a Random Utility Model \[ \text{Pr}(\text{Choice}=i) = \frac{e^{\beta_i x_i}}{\sum_j e^{\beta_j x_j}} \]
  • Fit to data by logit
  • The \(\beta\) are your summary statistics
  • Classic indirect inference
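A minimal, self-contained sketch of the auxiliary-model step, using a hypothetical one-covariate binary logit fitted by gradient ascent (the variable names and data are invented; in the real application the discrete-choice model above is fitted to the actual logbook):

```python
import math
import random

def fit_logit(xs, ys, lr=0.2, steps=5000):
    """Fit Pr(y = 1 | x) = sigmoid(a + b*x) by batch gradient ascent."""
    a = b = 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            grad_a += y - p
            grad_b += (y - p) * x
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b

# Hypothetical logbook: whether a spot is chosen depends on its distance.
random.seed(0)
distances = [random.uniform(0.0, 5.0) for _ in range(300)]
true_a, true_b = 2.0, -1.0
chosen = [1 if random.random() < 1.0 / (1.0 + math.exp(-(true_a + true_b * d))) else 0
          for d in distances]

# The fitted coefficients are the summary statistics: fit the same auxiliary
# model to the real logbook and to each simulated logbook, then compare.
a_hat, b_hat = fit_logit(distances, chosen)
```

Fitting the same auxiliary model to real and simulated data, then comparing the coefficients, is the core of indirect inference.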

Data Available - Aggregate Information

| Indicator | Value |
| --- | --- |
| Total Fish Landed | \(\dots\) |
| Effort | \(\dots\) |
| Average Distance Travelled | \(\dots\) |
| # of Trips | \(\dots\) |
| Average Hours Out | \(\dots\) |
| Profits Made | \(\dots\) |

Summary Statistics - Model and Reality

Fishery Example

  • 9 hypotheses:
    • each represents a different behaviour algorithm
  • 3 scenarios:
    • geographical/technical differences

Fishery Naive Error

Fishery Naive Error 2

The Solution

  1. Generate summary statistics \(\theta_1,\dots,\theta_n\)
  2. Compute the distance from the data as a weighted distance between summary statistics \[ \sum_i w_i \Delta_{\theta_i} \] or, more generally, as some function \[ f(\Delta_{\theta_1},\dots,\Delta_{\theta_n}) \]
  3. Tune the \(w_i\) to maximize model-selection success on training data sets

Build Training Data

  • If we generate data with one model, can we select it back by minimizing the naive error?
  • For each hypothesis:
    • generate 100 test cases
    • for each test case:
      • generate one run for each hypothesis
      • compute the summary-statistic distances
      • pick the model that minimizes them
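The loop above can be sketched with toy stand-in models (the Gaussian "hypotheses" and all constants here are invented; in the fishery application each run is a full simulation):

```python
import random
import statistics

random.seed(1)

# Toy stand-ins for the competing hypotheses: each produces a data set.
HYPOTHESES = {"A": 0.0, "B": 1.0, "C": 2.0}

def simulate(mu):
    """One stand-in model run: 50 noisy observations."""
    return [random.gauss(mu, 1.0) for _ in range(50)]

def summary_stats(data):
    return (statistics.mean(data), statistics.stdev(data))

def naive_pick(real_stats):
    """Pick the hypothesis minimizing the unweighted summary-statistic distance."""
    best, best_dist = None, float("inf")
    for name, mu in HYPOTHESES.items():
        sim_stats = summary_stats(simulate(mu))
        dist = sum(abs(r - s) for r, s in zip(real_stats, sim_stats))
        if dist < best_dist:
            best, best_dist = name, dist
    return best

# For each hypothesis generate test cases and check whether we select it back.
hits, total = 0, 0
for truth, mu in HYPOTHESES.items():
    for _ in range(100):
        real_stats = summary_stats(simulate(mu))
        hits += naive_pick(real_stats) == truth
        total += 1
success_rate = hits / total
```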

Training Data

| Hypothesis | \(\theta_1\) | \(\theta_2\) | \(\theta_3\) |
| --- | --- | --- | --- |
| “reality” | \(\dots\) | \(\dots\) | \(\dots\) |
| A | \(\dots\) | \(\dots\) | \(\dots\) |
| B | \(\dots\) | \(\dots\) | \(\dots\) |
| C | \(\dots\) | \(\dots\) | \(\dots\) |

Training Data - 2

| Hypothesis | \(\Delta_{\theta_1}\) | \(\Delta_{\theta_2}\) | \(\Delta_{\theta_3}\) | Correct |
| --- | --- | --- | --- | --- |
| “reality” | 0 | 0 | 0 | - |
| A | \(\dots\) | \(\dots\) | \(\dots\) | YES |
| B | \(\dots\) | \(\dots\) | \(\dots\) | NO |
| C | \(\dots\) | \(\dots\) | \(\dots\) | NO |

Training Data - 3

  • Turn this into a classification problem
  • Find function predicting: \[ f(\Delta_{\theta_1},\dots,\Delta_{\theta_n}) \to \text{Pr}(\text{Correct}) \]
  • Pick hypothesis by \[ \max f(\Delta_{\theta_1},\dots,\Delta_{\theta_n})\]
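A self-contained sketch of that classifier with invented toy models: one summary statistic tracks the model, the other is pure noise, and a logistic fit \(f\) learns to rely on the informative one. (Everything here, including the models and constants, is hypothetical.)

```python
import math
import random

random.seed(2)
MODELS = {"A": 0.0, "B": 1.0, "C": 2.0}

def run(mu):
    """One toy model run: an informative statistic and an uninformative one."""
    return (mu + random.gauss(0.0, 0.3), random.uniform(0.0, 1.0))

# Training rows: summary-statistic distances, labelled by whether the
# candidate hypothesis is the one that generated the "real" data.
rows = []
for truth, mu in MODELS.items():
    for _ in range(100):
        real = run(mu)
        for cand, cand_mu in MODELS.items():
            sim = run(cand_mu)
            d1, d2 = abs(real[0] - sim[0]), abs(real[1] - sim[1])
            rows.append((d1, d2, 1.0 if cand == truth else 0.0))

# Fit f: Pr(correct) = sigmoid(w0 + w1*d1 + w2*d2) by gradient ascent.
w = [0.0, 0.0, 0.0]
for _ in range(1500):
    grad = [0.0, 0.0, 0.0]
    for d1, d2, y in rows:
        p = 1.0 / (1.0 + math.exp(-(w[0] + w[1] * d1 + w[2] * d2)))
        err = y - p
        grad[0] += err
        grad[1] += err * d1
        grad[2] += err * d2
    for i in range(3):
        w[i] += 0.1 * grad[i] / len(rows)

def pick(real):
    """Select the hypothesis maximizing f, i.e. the predicted Pr(correct)."""
    def score(cand_mu):
        sim = run(cand_mu)
        d1, d2 = abs(real[0] - sim[0]), abs(real[1] - sim[1])
        return w[0] + w[1] * d1 + w[2] * d2  # sigmoid is monotone
    return max(MODELS, key=lambda name: score(MODELS[name]))
```

The fitted \(w_1\) ends up strongly negative (a large distance on the informative statistic makes "correct" unlikely) while \(w_2\) stays near zero: the classifier has effectively learned the weights.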

Results

Parameter Errors

  • Parameter uncertainties
  • If we get some parameters wrong, can we still pick the correct model?
  • Biology mis-specification
  • Add mistakes to your training set
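A minimal sketch of that last point, with invented parameter names: jitter the uncertain (e.g. biological) parameters before each training run, so the learned weights are also tested against mis-specification.

```python
import random

random.seed(3)

def perturbed(params, rel_sd=0.1):
    """Return a copy of the parameters with multiplicative Gaussian noise,
    mimicking our uncertainty about the true values."""
    return {name: value * (1.0 + random.gauss(0.0, rel_sd))
            for name, value in params.items()}

# Hypothetical biological parameters we are not sure about.
biology = {"growth_rate": 0.3, "carrying_capacity": 5000.0}
noisy_biology = perturbed(biology)
# Each training case would now be generated with its own noisy_biology.
```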

Parameter Errors - Results

Calibration

  • Sometimes you don’t have discrete hypotheses
  • Continuous parameters \(x\) you want to tune by minimising \[ \arg\min_x \sum_i w_i \Delta_{\theta_i}(x) \]
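A toy, self-contained sketch of this weighted calibration (the simulator, weights, and grid are all invented): one statistic responds to \(x\), the other is noise, and putting zero weight on the noisy statistic lets a simple grid search recover \(x\).

```python
import random

random.seed(4)

def model(x):
    """Hypothetical simulator: the first summary statistic tracks x,
    the second is pure noise."""
    return (2.0 * x + random.gauss(0.0, 0.02), random.gauss(5.0, 1.0))

target = model(0.5)  # stand-in for the real data

def weighted_distance(x, w):
    sim = model(x)
    return sum(wi * abs(t - s) for wi, t, s in zip(w, target, sim))

def calibrate(w, grid):
    """Grid-search version of argmin_x sum_i w_i * Delta_theta_i(x)."""
    return min(grid, key=lambda x: weighted_distance(x, w))

grid = [i / 100.0 for i in range(101)]
# Putting all weight on the informative statistic recovers x near 0.5;
# equal weights would let the noisy statistic dominate the search.
x_hat = calibrate([1.0, 0.0], grid)
```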

1D Calibration Example

The Problem

\[ \arg\min_x \sum_i w_i \Delta_{\theta_i}(x) \]

  • Can we change the \(w\) to make the minimization easier/better?

What we would like

The Solution

  • Generate many examples in pairs, varying parameters \(x\)
  • Solve as a regression problem

Training Data - 1

| Parameter | \(\theta_1\) | \(\theta_2\) | \(\theta_3\) |
| --- | --- | --- | --- |
| \(x_1\) | \(\dots\) | \(\dots\) | \(\dots\) |
| \(x_2\) | \(\dots\) | \(\dots\) | \(\dots\) |
| \(x_3\) | \(\dots\) | \(\dots\) | \(\dots\) |
| \(x_4\) | \(\dots\) | \(\dots\) | \(\dots\) |

Training Data - 2

| Parameter | \(\Delta_{\theta_1}\) | \(\Delta_{\theta_2}\) | \(\Delta_{\theta_3}\) | \(\Delta_x\) |
| --- | --- | --- | --- | --- |
| \(x_1\) | 0 | 0 | 0 | - |
| \(x_2\) | \(\dots\) | \(\dots\) | \(\dots\) | \(x_2 - x_1\) |
| \(x_3\) | 0 | 0 | 0 | - |
| \(x_4\) | \(\dots\) | \(\dots\) | \(\dots\) | \(x_4 - x_3\) |

Regression Problem

  • Try to predict \(\Delta_x\) by looking at difference in summary statistics \[ \Delta_x \sim g(\Delta_{\theta_1},\dots,\Delta_{\theta_n}) \]
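A self-contained sketch of that regression with an invented simulator (here \(g\) is just linear, solved by the 2×2 normal equations; the real \(g\) could be any regression model):

```python
import random

random.seed(5)

def model(x):
    """Hypothetical simulator: the first statistic responds to x, the second doesn't."""
    return (3.0 * x + random.gauss(0.0, 0.1), random.gauss(0.0, 1.0))

# Paired runs: record summary-statistic differences and the parameter change.
pairs = []
for _ in range(500):
    x1, x2 = random.uniform(0.0, 1.0), random.uniform(0.0, 1.0)
    s1, s2 = model(x1), model(x2)
    deltas = (s2[0] - s1[0], s2[1] - s1[1])
    pairs.append((deltas, x2 - x1))

# Least squares for dx ~ g1*d1 + g2*d2 (2x2 normal equations, no intercept).
a11 = sum(d[0] * d[0] for d, _ in pairs)
a12 = sum(d[0] * d[1] for d, _ in pairs)
a22 = sum(d[1] * d[1] for d, _ in pairs)
b1 = sum(d[0] * dx for d, dx in pairs)
b2 = sum(d[1] * dx for d, dx in pairs)
det = a11 * a22 - a12 * a12
g1 = (a22 * b1 - a12 * b2) / det
g2 = (a11 * b2 - a12 * b1) / det
# g1 comes out near 1/3 (inverting the 3x slope); g2 near zero, telling us
# the second statistic carries no information about x.
```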

The Result

In practice

In practice - 2

| target | Unweighted | Weighted |
| --- | --- | --- |
| 0.2 | 0.2479998 | 0.2172167 |
| 0.3 | 0.6849485 | 0.3150765 |
| 0.4 | 0.2213021 | 0.4239574 |
| 0.5 | 0.7106011 | 0.4171167 |
| 0.6 | 0.4584458 | 0.6486549 |
| 0.7 | 0.3571657 | 0.7446145 |
| 0.8 | 0.8427593 | 0.8106461 |